An intelligent framework to measure the effects of COVID-19 on the mental health of medical staff

The mental and physical well-being of healthcare workers has been affected by the global COVID-19 pandemic, which has impacted the mental health of medical staff in numerous ways. Most existing studies have examined sleep disorders, depression, anxiety, and post-traumatic problems in healthcare workers during and after the outbreak. The objective of this study is to evaluate the psychological effects of COVID-19 on healthcare professionals in Saudi Arabia. Healthcare professionals from tertiary teaching hospitals were invited to participate in a survey. Almost 610 people participated, of whom 74.3% were female and 25.7% were male. The survey covered both Saudi and non-Saudi participants. The study applies multiple machine learning algorithms: Decision Tree (DT), Random Forest (RF), K Nearest Neighbor (KNN), Gradient Boosting (GB), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). The machine learning models reach about 99% accuracy on the attributes recorded in the dataset. The dataset covers several aspects of medical workers, such as profession, working area, years of experience, nationality, and sleeping patterns. The study concludes that most of the participants working in medical departments faced varying degrees of anxiety and depression. The results reveal considerable rates of anxiety and depression among Saudi frontline workers.


Introduction
At the end of 2019, the World Health Organization was informed of cases of pneumonia of unknown cause in Wuhan city, in the Chinese province of Hubei. At the beginning of 2020, the World Health Organization declared the disease a pandemic, and psychological symptoms were reported worldwide [15]. In [16], 19.6% of respondents reported moderate to severe anxiety during the COVID-19 epidemic. Higher depression levels were associated with being a female student, living with a COVID-19-susceptible family member, being unmarried or separated, and being a student.
In [17], 2081 inhabitants and citizens of Saudi Arabia were examined to see how the pandemic affected their mental health. 7.3% of the respondents indicated that they experienced anxiety, according to the findings. The researchers also concluded that non-Saudis, single parents, senior citizens, and college students were the most likely to suffer from depression during the epidemic. A higher level of worry was associated with Saudi individuals who are married, unemployed, and wealthy.
In this study, we analyze a survey that was conducted to collect information from the healthcare workers of Saudi Arabia. The study aims to predict the levels of anxiety and depression using the data provided in the survey. Its purpose is to determine the prevalence of mental health symptoms and to highlight the factors that cause anxiety and depression in healthcare workers. The contributions of this paper are as follows:
• The study provides a comprehensive review of the health workers' data and of the impact of the COVID-19 outbreak on their mental health.
• The proposed study utilizes a number of machine learning algorithms, and the results obtained from these models are satisfactory.
• The machine learning models use the sample data to predict the risk factors that are most likely to contribute to depression and anxiety among healthcare workers.
• The results may assist government agencies and healthcare professionals in assessing the factors that contribute to depression and anxiety among medical staff.
The rest of the paper is organized as follows: section 2 presents the related work, section 3 describes the materials and methods of the proposed work, section 4 presents the findings of the proposed algorithms on the dataset, and the conclusion section concludes the paper.

Methodology
To assess the concerns of health workers, their past behavior, knowledge, and attitudes were examined to determine the psychological impact of COVID-19 on their mental health. For this reason, corresponding questions were asked during the data collection in Saudi Arabia. The survey also included several general questions covering aspects such as modes of COVID-19 transmission, symptoms, and several preventive measures. The dataset uses a scale to measure the anxiety and depression level of each respondent. The study uses classification techniques based on DT (Decision Tree), RF (Random Forest), KNN (K nearest neighbor), GB (Gradient Boosting), LightGBM (Light gradient boosting machine), and XGBoost (extreme gradient boosting) to predict the impact of COVID-19 on the psychological state of Health Care Workers (HCW), as shown in Fig 1. The results are reported in terms of Precision, Recall, F1-score, and Accuracy.

Dataset details
The dataset used in the study was obtained from the following link (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8418795/bin/peerj-09-12119-s001.sav). The dataset contained numerous gaps, and the data quality was not sufficient to apply a machine learning algorithm directly. The quality of the data directly affects the quality of the model. As the dataset contains different types of information, vital information is likely to be hidden amid noise in the large feature space.
Because the dataset contains both numerical and categorical data, it is transformed into the structure best suited to achieving maximum accuracy. Several data cleaning and feature engineering techniques are applied to encode non-numeric data, deal with missing values, handle outliers, and transform the data into a format accepted by the algorithms. Data cleaning aims to produce a dataset that answers the research questions and yields appropriate results. The categorical features are first separated from the other columns, which shows that the dataset has 58 categorical features. Summary statistics such as quartiles, means, and maximum values are then computed. The feature information shows 57 non-null columns in the dataset. The data then proceeds to preprocessing, where feature engineering shapes it into a format that makes model building possible.
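The cleaning steps described above can be illustrated with a minimal sketch. It assumes mode imputation for missing values and integer label encoding for categorical columns; the paper does not state which imputation or encoding scheme was used, and the column values and helper names (`impute_mode`, `label_encode`) are hypothetical.

```python
from collections import Counter

def impute_mode(column):
    """Replace missing entries (None) with the most frequent observed value."""
    observed = [v for v in column if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in column]

def label_encode(column):
    """Map each distinct category to an integer code (sorted for reproducibility)."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(column)))}
    return [mapping[v] for v in column], mapping

# Hypothetical survey column with gaps; the values are illustrative only.
profession = ["nurse", None, "physician", "nurse", "other", None]
filled = impute_mode(profession)
encoded, mapping = label_encode(filled)
print(filled)
print(encoded, mapping)
```

Mode imputation keeps the column categorical, which is why it is applied before encoding in this sketch; other strategies (dropping rows, model-based imputation) would be equally plausible choices.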
The study used a questionnaire dataset that recorded the behavior of the respondents who participated in the survey. The survey includes participants from different clinical departments and hospital units, including intensive care units, emergency departments, and general wards. Participants were selected by convenience sampling. The purpose of the survey was explained to the participants in advance via electronic mail so that they would take a keen interest in the activity and share correct information. The respondents were also given the opportunity to ask questions before contributing to the survey. The dataset thus consists of the responses of people who work in different departments of healthcare centers and who served during the COVID-19 period. The data was analyzed statistically, and the Likert-scale questions helped summarize the analytics. Physicians and non-physicians were compared using Fisher's exact test. To assess the impact of the different columns on the prediction, the study performs classification analyses considering the type of health care facility, ID, Age, Age record, gender, nationality, profession, work area, year of experience, sleeping disorder before COVID-19, mental disorder, sleeping pills, trouble to stay awake, enthusiasm, sleep quality, bed partner, loud snoring, log pause, leg twitching, disorientation, anxiety, level of anxiety, depression, and level of depression.
The demographics also include the age of the participants. The dataset distinguishes three professions of health care workers: physicians, nurses, and others, as shown in Fig 2(a). Nurses and midwives form the largest group at 50.6%. The dataset also records the working area of the participants. The working-unit column takes into account four working areas: ER, ward, ICU, and others. Acute care units (ER and ICU) account for 44.8% of participants, whereas other working areas, such as general hospital floors, auxiliary services, outpatient clinics, and academics, account for 19.4%, 4.5%, 28%, and 3.3%, respectively.
The dataset records the gender of the survey respondents. A pie chart shows the ratio of male and female respondents in Fig 2(b). The share of female respondents is high in comparison to male respondents. The blue area depicts the female participants, while the red area depicts the male participants. About 74.3% of the respondents (483) were female, and 25.7% (127) were male. For both groups, the dataset records the age group, profession, years of experience, working area, and other attributes.
A pie chart is also plotted to analyze the response rate by nationality, as depicted in Fig 2(c). The red part depicts the share of Saudi participants, while the blue area depicts the non-Saudi participants. Non-Saudi participants account for 70.9% (461 respondents). In contrast, the share of Saudi participants is comparatively low at 29.1% (189 respondents).
Similarly, a pie chart shows the age groups included in the dataset, as described in Fig 2(d). The 31 to 40 age group has the highest response rate, with 447 participants (68.8%). The second largest group is 20 to 30 years, with 123 participants (18.9%). The 41 to 50 age group follows with 63 participants (9.69%). Respondents above 50 form the smallest group, with only 17 participants (2.62%).
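As a worked check of the age-group arithmetic, the short sketch below recomputes the shares from the four counts reported in the text. The total is taken to be the sum of those four counts, which is an assumption of this example rather than a figure stated by the survey.

```python
# Age-group counts as reported in the text.
age_counts = {"20-30": 123, "31-40": 447, "41-50": 63, "above 50": 17}

# Assumption: the total is the sum of the four reported groups.
total = sum(age_counts.values())
shares = {group: round(100 * n / total, 2) for group, n in age_counts.items()}

for group, share in shares.items():
    print(f"{group}: {age_counts[group]} respondents, {share}%")
```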

Decision tree
Because they are simple to visualize, decision trees are among the best-known machine learning algorithms. A decision tree is constructed from nodes, which reveal the hierarchy of the model and the overall importance of the features at a glance. The tree breaks the dataset down into smaller subsets with higher homogeneity. As the dataset contains different features, trees of different shapes and depths can be constructed during training. A trained tree can predict the class label of any unseen or new data; however, building an optimal decision tree with the shortest length is computationally expensive [18].
A decision tree has three types of nodes: the root node, internal nodes, and leaf (terminal) nodes. In a decision tree classifier, each leaf node holds a class label, while the feature test conditions belong to the root and internal nodes. The hierarchy of non-terminal nodes plays a crucial role in building an optimal decision tree. In this study, the decision tree also assists in feature selection, as it indicates the significance of each attribute. A good split divides the sample so that the entropy of the resulting subsets is reduced. The entropy of a sample S is given as follows:
$$\mathrm{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$
where $p_i$ denotes the proportion of instances in S that belong to class i, and c is the number of classes. A simple decision tree is applied to the mental health dataset to measure accuracy. It also helps identify the model's most effective predictor [19].
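To make the entropy criterion concrete, the following sketch computes the Shannon entropy of a set of class labels and the information gain of a candidate split. It assumes the standard definition, entropy = -Σ p_i log2 p_i with p_i the fraction of samples in class i; the function names and the toy labels are illustrative and are not taken from the study.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy reduction obtained by splitting `labels` into `groups`."""
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Toy example: eight respondents, evenly split between two classes.
labels = ["mild"] * 4 + ["severe"] * 4
print(entropy(labels))                                       # balanced two-class sample
print(information_gain(labels, [labels[:4], labels[4:]]))    # split into two pure subsets
```

A tree-growing procedure would evaluate this gain for every candidate test condition and keep the split with the largest value.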

Random forest
Random forest is an ensemble technique that generates various classifiers and combines their output. It builds a collection of classification and regression trees (CART). Each CART is trained on a sample drawn from the original training data, and splits are determined by a random search across a subset of the input variables. Starting from the root node, which contains the entire learning sample, each node is split repeatedly into child nodes, producing a binary decision tree [20]. The random forest classifier lets each tree cast a vote for every input, and the class with the majority of votes is taken as the outcome. Because it ensembles a large number of trees, a random forest can handle high-dimensional data [21]. The following features of the random forest have helped in the proposed study:
• The random forest can efficiently estimate missing values in the dataset.
• The weighted random forest (WRF) variant helps handle imbalanced data in the dataset.
• The random forest also highlights the importance of the variables used in the classifier.
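The bootstrap-and-vote mechanism can be sketched in a few lines. This is a deliberately simplified illustration, not the configuration used in the study: each "tree" is a one-split stump instead of a full CART, the random feature subset has size one, and the data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Fit a one-split tree on one feature: keep the threshold with the fewest errors."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample (with replacement)
        feature = rng.randrange(len(rows[0]))           # random feature subset of size one
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature data with two well-separated classes.
rows = [(1, 1), (2, 2), (3, 3), (7, 7), (8, 8), (9, 9)]
labels = ["mild", "mild", "mild", "severe", "severe", "severe"]
forest = fit_forest(rows, labels)
print(predict_forest(forest, (0, 0)), predict_forest(forest, (10, 10)))
```

Individual stumps trained on unlucky bootstrap samples can be wrong, which is the point of the ensemble: the majority vote smooths out those errors.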

K nearest neighbor
K nearest neighbor (KNN) is popular among machine learning algorithms because of its simplicity. It rests on the idea that objects that are near each other probably share similar characteristics. An object is classified by a majority vote of its neighbors: it is assigned to the class most common among its K nearest neighbors. K is a positive integer and is usually small. The algorithm measures the distance from the new data point to all the known data points, and the new observation is assigned to the class that dominates this neighboring set. KNN is a good choice when there is little or no prior knowledge about the distribution of the data. However, if the dimension of the feature vector is high, KNN becomes computationally expensive, as it needs to store all the data points and compute their distances to each new observation [22].
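The procedure described above fits in a few lines of code. The sketch below assumes Euclidean distance and an unweighted majority vote, which are common defaults but are not confirmed by the paper; the training points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_rows, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_rows, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical two-feature data with two clusters.
train_rows = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_labels = ["mild", "mild", "mild", "severe", "severe", "severe"]
print(knn_predict(train_rows, train_labels, (2, 2)))
print(knn_predict(train_rows, train_labels, (8.5, 8.5)))
```

Every prediction scans the full training set, which is the source of the computational cost mentioned for high-dimensional or large datasets.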

Gradient boosting
A gradient tree-boosting model is built by combining many individual decision tree models. Trees are added one at a time, and each new tree is fitted to reduce the error made by the previous trees. A differentiable loss function is minimized with the help of gradient descent or stochastic gradient descent. Gradient boosting is often preferred over other machine learning algorithms for tabular datasets. For the gradient-boosting ensemble, the total number of decision trees is one of the crucial hyperparameters, and the performance of the model depends on an effective combination of the number of trees and the depth of the trees. Gradient boosting combines multiple weak tree-based classifiers into a strong and highly efficient classifier. It differs from random forest in how the trees are built: random forest trains its trees independently on samples of the data, whereas gradient boosting generates a new tree at each stage in sequential order, aiming to reduce the error made by the previous trees. This improves the accuracy of the model. As a non-linear algorithm, it can outperform linear algorithms on datasets with complex relationships. There are several variants of gradient boosting, including XGBoost, CatBoost, and LightGBM, which differ in the mechanism and implementation of the boosted trees [23].
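The stage-wise fitting can be sketched as follows. For brevity the example uses squared loss, for which the negative gradient is simply the residual, and one-split regression stumps on a single feature; a classifier such as the one in this study would use a classification loss and deeper trees. All names and data are illustrative, and the input is assumed to contain at least two distinct feature values.

```python
def fit_stump(xs, residuals):
    """Regression stump: the threshold split that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the maximum so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def fit_gradient_boosting(xs, ys, n_trees=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # initial constant prediction
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # For squared loss, the negative gradient is the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(l if x <= t else r for t, l, r in stumps)

# Hypothetical one-feature data with a step between x = 3 and x = 4.
xs, ys = [1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]
model = fit_gradient_boosting(xs, ys)
print(predict(model, 1), predict(model, 6))
```

Each stage shrinks the remaining residual by the learning rate, so the number of trees and the learning rate must be tuned together.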

Light gradient boosting
Light Gradient Boosting Machine (LightGBM) is a popular learner within the gradient boosting framework. Like other boosting methods, it trains its trees sequentially. LightGBM uses gradient-based one-side sampling (GOSS), which balances training speed against accuracy by reducing the number of data points while preserving the accuracy of the learned trees. One-side sampling based on gradients, together with exclusive feature bundling (EFB), distinguishes LightGBM from CatBoost and XGBoost. GOSS keeps the data instances with large gradients and randomly samples from the instances with small gradients; the retained instances are then used to estimate the information gain in each decision tree. The rationale is that instances with larger gradients play a more significant role in the computation of information gain, so GOSS can obtain an accurate estimate of the information gain from a smaller amount of data. Exclusive feature bundling helps reduce the number of features. Mutually exclusive features, such as one-hot encoded features, rarely take nonzero values at the same time and can therefore be bundled. According to the LightGBM authors, finding the optimal bundling of exclusive features is NP-hard, so a greedy algorithm is used to achieve a good approximation ratio. This reduces the number of features without affecting the overall model's accuracy [24].
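A minimal sketch of the GOSS sampling step is given below. It follows the scheme in the LightGBM paper: keep the top fraction a of instances by absolute gradient, sample a fraction b of all instances from the remainder, and up-weight the sampled instances by (1 - a) / b. The function name, the default rates, and the gradient values are illustrative assumptions, and this is not LightGBM's actual implementation.

```python
import random

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {instance index: weight} for the instances kept by GOSS."""
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top, rest = order[:n_top], order[n_top:]
    sampled = random.Random(seed).sample(rest, n_other)
    # Small-gradient instances are amplified to keep the gain estimate unbiased.
    amplify = (1 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return weights

# Hypothetical gradients: two large values and eight small ones.
gradients = [0.9, -0.8, 0.05, 0.02, -0.01, 0.03, 0.04, -0.02, 0.01, 0.06]
weights = goss_sample(gradients)
print(weights)
```

With these rates, only three of the ten instances are used to evaluate splits, while the weighting compensates for the discarded small-gradient instances.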

Extreme gradient boosting
XGBoost, or extreme gradient boosting, is an optimized implementation of the gradient boosting algorithm. It is regarded as one of the best-performing models in supervised learning and is widely used because it handles both classification and regression. XGBoost supports out-of-core computation, and its execution speed is considerably high compared to other algorithms [25]. Because it supports parallel processing, it can run on both single machines and distributed systems. It is widely used for large datasets, as it efficiently manages data that exceeds the available RAM. The regularization terms in XGBoost help reduce overfitting. It also allows the size of a decision tree to be limited and supports tree pruning, which means a tree will not grow beyond a specified limit. XGBoost can also handle missing values in the dataset. XGBoost was initially developed to improve on the training time of gradient boosting machines (GBM). Multiclass classification problems can be solved with its parallel tree-boosting framework. Instead of a single train-test split, cross-validation, a well-known technique for estimating accuracy, is preferred. In k-fold cross-validation, the data is shuffled randomly and divided into k groups. Each group is used in turn as the test data, while the rest of the data is used to train the model. The accuracy is then averaged over the k folds. Leave-one-out cross-validation is obtained by setting k to the number of observations [26].
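The cross-validation procedure can be sketched independently of any particular model. In the example below, `fit` and `predict` are placeholder callables supplied by the caller, and the helper names are hypothetical; setting `k` equal to the number of samples yields leave-one-out cross-validation.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices, split them into k folds, and yield (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def cross_val_accuracy(fit, predict, rows, labels, k=5):
    """Average test accuracy over the k folds."""
    scores = []
    for train, test in k_fold_indices(len(rows), k):
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, rows[i]) == labels[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / k

for train, test in k_fold_indices(10, 5):
    print("train:", sorted(train), "test:", sorted(test))
```

Every observation appears in exactly one test fold, so the averaged score uses all of the data for evaluation without ever testing on a training example.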

Results
The dataset was collected with the assistance of 71.8% of Health Care Workers (HCWs) who helped in gathering information. Among the respondents, 75% were female, and a majority of 62.4% belonged to the nursing profession. The performance of the machine learning algorithms was evaluated using accuracy, precision, recall, and F1-score. Accuracy measures the proportion of all predictions that match the actual values. Precision measures the proportion of instances predicted as positive that are actually positive. Recall, on the other hand, measures the proportion of actual positive samples that the model detects; a higher recall means the model detects more of the positive samples. The F1-score is the harmonic mean of precision and recall and is commonly used to evaluate classification systems; a higher F1-score generally means the model performs better. Table 1 shows the prediction values for depression detection using the different machine learning models. Among the proposed models, Decision Tree, Random Forest, K Nearest Neighbor, and LightGBM have the highest recall values and the highest F1-scores, and they offer the most satisfactory results. The precision of the decision tree is above 97%, which is reasonably satisfactory. The Random Forest classifier and K Nearest Neighbor also provide favorable outcomes, with rates of 95% for precision, recall, and F1-score. Gradient-boosting algorithms can be challenging to apply: training takes longer, they are resource-intensive, and they are prone to overfitting. However, in this research, the Light Gradient Boosting Classifier achieved a considerable outcome of 99%. LightGBM outperforms all the other algorithms and gives the best rates for accuracy.
In addition to depression detection, the study also explored predicting anxiety levels in health workers. The dataset records participants' anxiety and depression levels during and after the COVID-19 pandemic, where anxiety is defined as a feeling of fear and loss of control in a particular situation. The results show that health workers had a high level of anxiety during the pandemic, while signs of depression were still evident after the pandemic period. Table 2 presents the prediction results for anxiety levels using the six algorithms. The Light Gradient Boosting Classifier had the highest accuracy among the six algorithms, with nearly 99% accuracy in predicting both depression and anxiety levels. Higher precision, recall, and F1-score values indicate better prediction. The Random Forest Classifier and K Nearest Neighbor offered similar results, with an accuracy of almost 98%. The Decision Tree and Extreme Gradient Boosting Classifier obtained an F1-score of 1; the F1-score ranges from 0 to 1, with values closer to 1 indicating better prediction. The overall accuracy of the model was above 99%, making it suitable for further use.
Performance measures help in analyzing the overall results of the machine learning model. The combined results are represented in the confusion matrices of the different algorithms, which present the findings visually. Four primary classification measures are used in this study: Accuracy, Precision, Recall, and F1-Score. With TP, TN, FP, and FN denoting the numbers of true positives, true negatives, false positives, and false negatives, they are defined as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
To better understand the performance of the classification algorithms, the confusion matrix was used in the study, as shown in Figs 3 and 4. Accuracy values can be misleading if the dataset contains more than two classes or an unequal number of observations per class. The confusion matrix highlights the errors and gives a clear picture of the classification model's results. It summarizes the predicted results, and its main purpose is to identify where the machine learning models went wrong. The confusion matrix has two axes, with the actual test values of the dataset on the y-axis and the predicted values on the x-axis. The scale has three levels of intensity: mild, moderate, and severe. Confusion matrices are drawn for both the anxiety and depression classes for each algorithm: Decision Tree, Random Forest, K Nearest Neighbor, Gradient Boosting, Extreme Gradient Boosting, and Light Gradient Boosting.
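For a three-class problem such as this one, precision, recall, and F1-score are computed per class by treating that class as positive and the other two as negative. The sketch below shows the computation on a hypothetical confusion matrix whose rows are actual classes and whose columns are predicted classes; the counts are illustrative and are not the values from Figs 3 and 4.

```python
def per_class_metrics(matrix, classes):
    """Return {class: (precision, recall, f1)}; rows are actual, columns are predicted."""
    results = {}
    for k, name in enumerate(classes):
        tp = matrix[k][k]
        fp = sum(matrix[r][k] for r in range(len(classes))) - tp  # predicted k, actually other
        fn = sum(matrix[k]) - tp                                  # actually k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[name] = (precision, recall, f1)
    return results

classes = ["mild", "moderate", "severe"]
# Hypothetical counts for illustration only.
matrix = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
metrics = per_class_metrics(matrix, classes)
for name, (p, r, f1) in metrics.items():
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```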

Anxiety classification results
For the classification of anxiety, Fig 3 shows the performance of the proposed algorithms in the form of confusion matrices. There are three anxiety classes (mild, moderate, and severe), with 92 instances of the mild class, 59 instances of the moderate class, and 64 instances of the severe class. The diagonal values represent the number of instances that are correctly classified for each class by the proposed algorithms. To evaluate the models further, additional metrics such as precision, recall, and F1-score are reported in Table 2.
The Decision Tree, XGB, and Gradient Boosting algorithms correctly classify 92 instances of the mild class, 59 instances of the moderate class, and 64 instances of the severe class. The number of wrong predictions for the decision tree, XGB, and Gradient Boosting is zero, which means that all the instances are classified correctly.
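As a worked example, accuracy can be recovered from the diagonal counts reported in this section for the anxiety classes, since accuracy is the sum of the diagonal divided by the number of instances. The sketch below uses only counts stated in the text; the grouping of algorithms into three rows is for brevity.

```python
# Class sizes and correctly classified counts for anxiety, as reported in the text.
class_totals = {"mild": 92, "moderate": 59, "severe": 64}
correct = {
    "Decision Tree / XGB / Gradient Boosting": {"mild": 92, "moderate": 59, "severe": 64},
    "KNN / Random Forest": {"mild": 92, "moderate": 59, "severe": 61},
    "LightGBM": {"mild": 90, "moderate": 59, "severe": 64},
}

total = sum(class_totals.values())
accuracy = {name: sum(counts.values()) / total for name, counts in correct.items()}

for name, value in accuracy.items():
    print(f"{name}: {value:.4f}")
```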
For the KNN and random forest algorithms, 92 instances of the mild class, 59 instances of the moderate class, and 61 instances of the severe class are correctly classified. It is important to note that the number of correctly classified instances for the moderate class is lower than that of the other two classes. There may be an imbalance issue in the moderate class. Similarly, LightGBM correctly classifies 90 instances of the mild class, 59 instances of the moderate class, and 64 instances of the severe class. The results for LightGBM are also satisfactory.

Depression classification results
For the classification of depression, the diagonal values again represent the number of instances that are correctly classified for each class by the proposed algorithms. To evaluate the models further, additional metrics such as precision, recall, and F1-score are reported in Table 1.
The number of wrong predictions of the decision tree, XGB, and Gradient Boosting is zero, which means that all the instances are classified correctly.
Similarly, the confusion matrices for random forest and KNeighbors indicate that 63 instances of the mild class and 143 instances of the severe class are correctly classified. However, the lower-triangle value of 8 for the moderate class indicates that 8 instances belonging to the moderate class were misclassified as another class. This means that random forest and KNeighbors perform well, but there is a class imbalance issue at the moderate level.
Similarly, LightGBM correctly classifies 64 instances of the mild class, 143 instances of the severe class, and 7 instances of the moderate class. The results for LightGBM are also satisfactory.
The results of this study suggest that machine learning algorithms can be valuable in predicting mental health outcomes among healthcare workers during pandemics. The findings also highlight the need for mental health support for healthcare workers, especially in addressing the high levels of depression they face during pandemics.

Conclusion
The COVID-19 pandemic has resulted in a significant impact on the mental health of healthcare workers globally. They are now more susceptible to depression and anxiety. In this study, we aimed to identify the relationship between the psychological impact of COVID-19 and several sociodemographic factors experienced by frontline health workers in Saudi Arabia. The dataset used in this study was collected through an online survey of medical health professionals. It consisted of 75 columns, out of which 58 were categorical columns. To analyze the dataset, we employed various machine learning algorithms, including decision tree, random forest, kNeighbors, gradient boosting, extreme gradient boosting, and light gradient boosting. Our analysis revealed that decision tree, gradient boosting, and extreme gradient boosting achieved 100% correct classification results for both anxiety and depression detection. However, random forest and kNeighbors misclassified instances with mild anxiety and moderate depression. Similarly, LGBM misclassified two instances of mild anxiety and one instance of moderate depression. Despite these misclassifications, all the algorithms' accuracy rates are above 99% for anxiety and depression detection. Our results showed considerable rates of anxiety and depression among Saudi front-line workers. Our study's intended purpose is to inform policy-makers about the importance of healthcare professionals' mental health condition.